Restructuring Multilingual Web Sites
Authors
Abstract
Current Web site development practice does not explicitly address the problems specific to multilingual sites. Such sites are expected to provide the same information, navigation paths, page formatting, and organization regardless of the chosen language. This is typically ensured by adopting personal conventions for naming pages and placing them in the file system. Updates are then performed manually, and consistency depends on the programmers' ability not to miss any impact of a change. In this paper, an extension to XHTML called MLHTML (MultiLingual XHTML) is proposed as the target representation of a restructuring process aimed at producing a maintainable and consistent multilingual Web site. MLHTML centralizes the language-dependent variants of a page in a single representation in which shared parts are not duplicated. Existing sites can be migrated to MLHTML using the algorithms described in this paper. After the pages are classified by language, a page alignment technique is applied to identify corresponding pages and eliminate inconsistencies. The transformation into MLHTML can then be performed automatically.
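The migration steps summarized above (classify pages by language, align corresponding pages, then merge each aligned group into one representation that stores shared parts once) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's algorithms and not MLHTML markup: a hypothetical `<base>.<lang>.html` file-naming convention stands in for language classification, pages are assumed to be already split into structurally aligned segments, the merged Python list is only an in-memory analogue of a single multilingual page, and the helper names `classify`, `align`, `merge`, and `render` are invented for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from pathlib import PurePosixPath


def classify(path: str) -> tuple[str, str]:
    """Return (base_name, language) for a page path.

    Assumption: a hypothetical "<base>.<lang>.html" naming convention,
    e.g. "site/news.en.html" -> ("site/news", "en").
    """
    p = PurePosixPath(path)
    if "." not in p.stem:
        raise ValueError(f"no language suffix in {path!r}")
    stem, lang = p.stem.rsplit(".", 1)
    return str(p.parent / stem), lang


def align(paths: list[str], languages: list[str]):
    """Group pages by base name and report bases missing some language."""
    groups: dict[str, dict[str, str]] = defaultdict(dict)
    for path in paths:
        base, lang = classify(path)
        groups[base][lang] = path
    wanted = set(languages)
    missing = {
        base: sorted(wanted - set(variants))
        for base, variants in groups.items()
        if wanted - set(variants)
    }
    return dict(groups), missing


def merge(variants: dict[str, list[str]]) -> list:
    """Merge aligned segment lists; identical segments are stored only once."""
    langs = sorted(variants)
    if len({len(segs) for segs in variants.values()}) != 1:
        raise ValueError("pages are not structurally aligned")
    merged: list = []
    for column in zip(*(variants[lang] for lang in langs)):
        if len(set(column)) == 1:
            merged.append(column[0])  # shared part, kept once
        else:
            merged.append(dict(zip(langs, column)))  # per-language variants
    return merged


def render(merged: list, lang: str) -> list[str]:
    """Recover the single-language page from the merged representation."""
    return [seg if isinstance(seg, str) else seg[lang] for seg in merged]


if __name__ == "__main__":
    groups, missing = align(
        ["index.en.html", "index.it.html", "news.en.html"], ["en", "it"]
    )
    print(missing)  # {'news': ['it']}
    merged = merge(
        {"en": ["<h1>", "Welcome", "</h1>"], "it": ["<h1>", "Benvenuti", "</h1>"]}
    )
    print(merged)  # ['<h1>', {'en': 'Welcome', 'it': 'Benvenuti'}, '</h1>']
```

In the demo, `missing` flags `news` as lacking an Italian counterpart, the kind of inconsistency that alignment is meant to expose before transformation, while `merge` keeps the shared `<h1>` tags once and stores only the heading text per language.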
Similar resources
EuroGOV: Engineering a Multilingual Web Corpus
EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawled from the European Union portal, European Union member state governmental web sites, and Russian government web sites. The corpus contains over 3 million documents written in more than 20 different European languages...
Towards a Multilingual Ontology for Ontology-driven Content Mining in Social Web Sites
Social Semantic Web aims at combining approaches and technologies from both the Social and the Semantic Web. While Social Web sites provide a rich source of unstructured information, which severely limits its automatic processing, the Semantic Web aims at giving a well-defined meaning to Web information, facilitating its sharing and processing. Multilinguality is an emergent aspect to be considered in...
A Model of Versioned Web Sites
In this paper we present a model of versioned web sites which is aimed at building a web site configuration. The web site configuration is a consistent version of the web site and serves for navigation purposes. We exploit the fact that the versioning of web sites is in many aspects similar to versioning of software systems (and their components). On the other hand, specific characteristics rel...
Adaptive, Multilingual Named Entity Recognition in Web Pages
Most of the information on the Web today is in the form of HTML documents, which are designed for presentation purposes and not for machine understanding and reasoning. Existing web extraction systems require a lot of human involvement for maintenance due to changes to targeted web sites and for adaptation to new web sites or even to new domains. This paper presents the adaptive, multilingual n...
Building a Social Media Digital Library: Collection, Management, and Analytics
In this talk I will present the University of Arizona Artificial Intelligence Lab’s recent research in Dark Web, Geopolitical Web, and Business Analytics. Based on funding from the NSF and several other US agencies, the AI Lab has developed techniques for collecting, managing and analyzing large-scale multilingual and multimedia social media contents of relevance to social, geopolitical, and bus...
Publication date: 2002